GENERAL INTRODUCTION

Numerous strategies have been used throughout the years to test
cultural groups. This paper grew out of the need to find and use stand-
ardized tests that would depict accurately the performance of various
cultural groups in America.

In order to make judgments about performance, it is wise to examine
the theoretical structure from which most of the existing tests were
developed. Accordingly, the paper traces the development of the various
strategies and theoretical structures, explaining why they have met with
limited success. Through a paradigmatic analysis of the literature, it
identifies the existing testing paradigm as a monocultural one, and it
relates the various efforts to produce a culture-fair test.

The paradigmatic analysis is extended to encompass a proposition,
based upon the coalescence of the scientific (theoretical and measurement)
and policy contexts. The analysis suggests a procedure by which tests
can be developed and/or evaluated if they are to depict accurately the
performance of various cultural groups.

PART I

THE MONOCULTURAL TRADITION OF TESTING



In this paper, the testing tradition has been subjected to a
paradigmatic analysis, which is the examination of the nature and
dimensions of a testing paradigm.* It is shown that the study of
cultural groups was accomplished by the application of the existing
testing process.

A. DEVELOPMENT OF THE EXISTING PARADIGM

NATURE OF THE EXISTING TESTING PARADIGM
The contemporary testing paradigm is monocultural in nature;
that is, it has relevance primarily for one dominant cultural group.
It is born of a tradition that has elevated testing to a position
of prestige and influence in the American way of life. This elevation
has been accomplished through several historical, social and economic
events. Such events have included the growth of the melting pot concept
and the emergence of the criteria for cultural group separation.



THE GROWTH OF THE "MELTING POT"

The concept of the "melting pot" was greatly responsible for
achieving the cultural homogeneity needed for certain groups to be
assimilated into the political, legal and social developments of
American life.

Historical precedents can explain the emergence of such a timely
concept. The Anglo-Saxon cultural groups from northern and western
Europe were heavily represented in earlier populations who immigrated
to this country. Eventually, the magnitude and the diversity of
cultural groups were extended to populations from southern and eastern
Europe.

*A paradigm consists of dimensions or "sets" that have common
postulates and uniformly accepted meanings that have been attributed
to those postulates.
Millions of immigrants brought strong tendencies toward cultural
diversity which were incorporated into the American heritage. There
was a need, however, to solidify these differing views into a common
frame of reference if the American culture were to withstand the
impact of the cross-cultural varieties of poverty, wealth, geography,
religiosity and language. The structure for this frame of reference
had been set by Anglo-Saxon cultural traditions [illegible] was the
development [illegible]. These systems were responsible for a
[illegible] of expanding institutions, and for motivating and
synchronizing the specialization of those institutions, whereby there
could be a [illegible] cultural differences. The absence of different
social treatment for the groups lessened the impact of the cultural
differences and permitted the groups to coexist peacefully. The unity
of the groups was further solidified in the monocultural tradition by
the adoption of an American version of English speech.

The maintenance of the monocultural tradition has resulted in
the concept of the "melting pot," a perspective that remained a
myth for non-European cultural groups. The melting pot did not
at that time become a reality even for some European immigrants who
were non-English speaking.** Quota laws for Europeans after World
War I were more directly controlled by the Federal Government than
by the individual states. The immigration of groups from
southern and eastern Europe was limited by law, but not in practice.
At first, neither the non-European nor the European groups were
allowed to blend into the politically and economically unified
dominant group. Eventually, however, the latter did merge.



*Carl Wittke, "Historical Background: Immigration Policy Prior
to World War I," in Benjamin M. Ziegler (ed.), Immigration: An
American Dilemma, D.C. Heath and Co., Boston, 1953, pp. 1-10.

Wittke, loc. cit., p. 2.

Ziegler, Immigration, Boston: Heath, 1953.
The reason for the difference in treatment between the two groups
was that the non-Europeans were subjected to the enactment of special
immigration laws, a tradition of slavery, and the colonial occupation
of Indian land by the Europeans.

In a cultural sense, these non-European groups have been separated
from the "dominant" European group by the "entry status"* accorded
them as they became a part of the American scene. This primary
distinction of entry status separates the "dominant" and "minority"
cultures, the latter being defined as Blacks, Chicanos, Native Americans
and Oriental/Asiatics, and present-day population statistics support
this distinction. An interesting corollary is the proportion of
these groups which is below the official poverty level in the United
States.

Because cultural heterogeneity and low socio-economic status
have complicated the assimilation process into the "dominant" American
culture for these groups, a type of cultural disenfranchisement of
these minority groups has resulted.

DIMENSIONS OF THE EXISTING PARADIGM

Scientific history has caused three dimensions or "sets" to
converge in the testing process: the theoretical, methodological
and functional. The assumptions underlying these dimensions were
predicated upon and/or inspired by the European tradition and were
incorporated into the rapidly changing American society.
*Such a status can be exemplified by African slaves, Mexican-American
migrant farm workers, Puerto Rican immigrants from protectorates in the
colonial era, Native Americans who seek initiation into the mainstream
from the reservations, Japanese Americans in West Coast concentration
camps during World War II, and Chinese Americans in defined upper social
and economic statuses rather than status as railroad laborers (circa
1[illegible]6).

Decennial Census data, 1970: The "dominant" group makes up 87.5% of the
population, of which 5.3% is below the poverty line. Minority cultural
groups: Blacks make up 11.1%, with 29.9% below poverty; other groups
make up 1.4% of the population (each separate group makes up less than
1% by itself). The proportions below the poverty line include: Filipino
11.5%; American Indians 33.3%; Spanish surname 23.4%; Spanish origin
21.1%; Japanese 6.4%; Chinese 10.3%.

"Sets" are those dimensions of the testing paradigm which are
mutually exclusive in performance, yet interrelate to produce the
testing process.
THEORETICAL DIMENSION

The primary assumption underlying the theoretical development of
testing was that human traits were "inborn" or fixed and that these
traits could be observed in physical characteristics and behaviors
of human beings. This assumption led to an interest in the
possibility of a psychological model which would be
based on the need to characterize the distribution and variability of
individual differences. Inspired by the work of Galton, Darwin and
others, the assumption led to the logical conclusion that quantitative
measurement could be devised. Thus, it was conceptualized that
highly developed mental traits could be characterized as a set of
"intelligent behaviors" or "non-intelligent behaviors", and that
these categories could be applied to "bright" and "dull" individuals
respectively. Thus, much of the theoretical development in testing
came to be founded in intelligence testing.
Subsequent assumptions have tended only to elaborate on the
collectiveness and intricateness of the traits. The first such effort
was recorded by Spearman (1927), whose "general factor" ("g" factor)
was found to be present in all standardized tests of intelligence,
and it was the "g" factor that allowed the measurement of complex
mental abilities. Later, Thurstone (1938) developed a multiple-
factor analysis, illustrating that Spearman's "g" factor could be
defined in terms of a number of primary abilities or tests, such as
verbal comprehension, space, reasoning and others. This discovery
provided greater stability for the structure of the "g" factor, and
this stability was not seriously questioned until the work of Cattell.

The investigations of Cattell in the 1940's expanded the omnibus
"g" factor theory into a two-factor theory, thereby introducing
another determinant of "intelligent behaviors." He distinguished
between a fluid factor (gf), which is independent of cultural and
educational acquisitions, and a crystallized factor (gc), which is
primarily dependent on cultural knowledge and educational attainment.

Cattell, Raymond (1968), p. 58.
Several investigators have been particularly concerned with the
structure of the factors underlying intelligent behaviors and have
developed models for these factors. Burt (1949), Vernon (1960), and
Humphreys (196[illegible]) have been particularly interested in
hierarchical levels of factors. Guilford (1956) has focused on the
"structure" of intelligent behaviors. His work has revealed five kinds
of intellect: memory, cognition, convergent thinking, divergent
thinking, and evaluation, that can be applied to three types of
content: figural, structural and conceptual. As one can conclude, he
has been instrumental in constructing a multidimensional model that can
facilitate any number of combinations including intellect and content.

In summary, the terms "genotype" and "phenotype" can be used to
distinguish the theoretical assumptions that have been offered to
designate those factors considered determinants of "intelligent
behavior". Genotype has been defined to mean those traits that have
been mainly inheritable, while phenotype refers to those traits that
have been modified and molded by environmental influences (Dobzhansky,
1951). Vernon (1965) states that still another factor should be
considered when examining the different abilities measured by a
particular intelligence test. He contends that each test measures
a different set of abilities, and therefore it adds a test-induced
factor to the theoretically derived factors of intelligence.



METHODOLOGICAL DIMENSION 



The discovery that human behavior could be described in statistical
statements was fostered to a great extent by such pioneers as Quetelet
and Galton. Galton was influenced by the mathematical formulations
of Quetelet, and as a consequence, he devised many measuring instru-
ments in an attempt to describe sensory thresholds of individuals.

Influenced also by Darwin's writings about selective breeding,
Galton formulated two laws that have had a strong impact
on many of the methodological procedures used in present-day test
construction and analysis of test data. Hirsch (1973) describes
these two laws as the "Law of Ancestral Heritage" and the "Law of
Regression". Galton's laws imply strong assumptions about normative
populations and about the variability of those populations.

Binet extended Galton's concern for sensory measurement to the
measurement of higher mental abilities, and therefore, he devised
tests involving complex mental tasks. Terman (1916) standardized
these tasks and adapted other tasks to produce the Stanford-Binet
Intelligence Scale for the American population. The methodological
procedures were set in motion, providing the foundation for other
kinds of standardized tests such as achievement, aptitude and
diagnostic tests.

In order to understand most of the data collected from the use of
intelligence, aptitude or achievement tests, Kerlinger (1964) requires
the adherence to certain measurement postulates.*

In theory, the measurement postulate that governs the kind of
data that is generated by intelligence, aptitude or achievement
tests requires the use of ordinal scaling techniques. In other
words, one's test scores can be given rank order values. Ordinal
scale data presuppose that there is no absolute zero point that
can be designated, nor can one assume that there are equal empirical
distances between the scores.
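
The practical meaning of the ordinal postulate can be sketched in a
few lines of Python. The scores below are invented for illustration
and are not drawn from any study cited in this paper; the function
names are likewise hypothetical. Under those assumptions, the sketch
shows that a comparison of group medians, which uses only rank order,
survives an order-preserving rescaling of the scores, while a
comparison of group means, which presupposes equal intervals, can
reverse.

```python
import math
from statistics import mean, median

# Hypothetical raw scores for two groups (invented for illustration only).
group_a = [1, 2, 10]
group_b = [3, 4, 5]


def rescale(scores):
    """Apply an order-preserving (monotone) transformation: the square root."""
    return [math.sqrt(s) for s in scores]


def higher_group(stat, a, b):
    """Return 'A' or 'B' according to which group has the larger statistic."""
    return "A" if stat(a) > stat(b) else "B"


# Which group looks "higher" before and after rescaling, by mean and by median.
mean_before = higher_group(mean, group_a, group_b)
mean_after = higher_group(mean, rescale(group_a), rescale(group_b))
median_before = higher_group(median, group_a, group_b)
median_after = higher_group(median, rescale(group_a), rescale(group_b))

print("mean comparison:  ", mean_before, "->", mean_after)
print("median comparison:", median_before, "->", median_after)
```

With these invented scores the mean comparison changes direction
after rescaling while the median comparison does not, which is one
way of seeing why interval-level statistics applied to ordinal data
may be hard to interpret.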

This characteristic has proven to be a source of conflict in
present-day testing practice. The conflict arises when most psy-
chological scales, which are essentially ordinal, must be assumed
to have equal intervals, a practice which has occurred primarily
out of the necessity to use the most robust statistical tools avail-
able. However, there is always the recurring problem of being able
to adjust the mechanical procedures enough to assure equality of
interval without the expense of loss of interpretability of the
data. It has become a significant problem, especially when certain
psychological scales have been applied to diverse population groups.
It is in the instance of applying interval-scaling techniques to
ordinal scale data of very heterogeneous populations where the
serious errors in interpretation may be too costly to be overlooked.
At this point, it may be advantageous to look at the theory/practice
conflict to assess those instances when the interpretation of the
data is grossly distorted.

*Kerlinger, Fred N., Foundations of Behavioral Research, Holt,
Rinehart and Winston, Inc., 1964, Chapter 30, p. 420.

The methodological procedures or psychometric applications in
testing have been enhanced through the tremendous aid of the computer.
Test format, item selection, complex scoring and the use of multi-
variate and factor analyses have contributed to the range and depth
of the investigation for which testing instruments have been designed.*
The technological boom may be one of the reasons for the general
reluctance to reexamine and possibly revise earlier postulates and
subsequent procedures of testing.

FUNCTIONAL DIMENSION

The most important assumption underlying the functional dimension
is that tests can predict human behavior with respect to selected
criteria. Unfortunately, problems have arisen, especially in the field
of education, because: (a) the selected criteria were imposed equally
upon all segments of the population; (b) there was an increasing demand
for mass testing, where the role of testing in prediction became
intricately linked with the role of testing in placement; (c) a conflict
arose between the needs and interests of the individual, the group and
the institution; and (d) testing came to be used as a policymaking
tool.

The doctrine that the masses have a right to equal educational
opportunity has been accepted generally in recent Western history,
and that criterion assumes that even the masses are on an equal
socio-economic footing. However, this has not always been held to
be true. Prior to the mid-17th century, public education in England
was meant for a relatively poor group of people, i.e., merchant
families and landlords. However, after this period, the financial
burden of schooling shifted educational attainment to the aristo-
cracy. This period of educational history is of particular interest
because it set the momentum for modeling the curricula and instruc-
tional methods that enabled the upper classes to survive in their own
environment. However, in most of Europe, the masses regained their
right to education through the events brought about by the Reformation.

*Hieronymus (1971) discusses the use of technology in today's
testing (p. 59).

Subsequent historical periods, both in the United States and
Western Europe, have been responsive to the rights of a person to
educational opportunity, but the educational curricula and the methods
of teaching traditionally have retained the environmental learning
characteristics most suitable for the capabilities of individuals
found in the upper social classes.

In spite of the work of John Dewey,* where he extolled the virtues
of the scientific method for delineating classroom experiences, there
has been little overall change in the nature of classroom environmental
learning characteristics, even though it has been found in countless
recent studies, especially in the compensatory education programs of
the 1960's, that the application of the same criteria to different
cultural groups may not be appropriate in view of the different
educational experiences of various groups.

The role of testing in prediction and placement has its origins
in the Industrial Revolution, about 1830, and in World War I. As
American business and industry became concerned about the use of
scientific management principles, the educational system became the
primary social institution that could effectively sort and place
the diversity of talent needed for a growing technological society.
Pressures to produce "objective" testing instruments for military
use in both World Wars led to the creation of the U.S. Army's Alpha
and Beta Examinations. It was reasoned that a person's capability
and potential could be measured and categorized to serve the interests
of the individual and the needs of the Army. The Army tests had as
their criteria "a high degree of reliability and a moderate degree
of validity."** It can be said that the highly specific rationale
for mass testing that emerged in the context of the war era continues
to dominate the policy and practice of testing even today.

*John Dewey, cited in Tesconi and Morris (1972), p. 150.

**Guilford (1946), p. 427.
The ranking of individuals has been in response to a given norm,
which happens to be biased towards that of the dominant group. There-
fore, minorities can be misplaced because all levels of minority
groups usually have not been represented in the establishment of the
norm.

When testing is viewed from the perspective of either the indivi-
dual, the group or the institution, there must be a compromise in
the assertion of the needs and interests of each.

Tests have been used primarily to satisfy an individual's needs,
such as guidance into an appropriate career opportunity and the identi-
fication of strengths and weaknesses in given subject areas. Manning
(1968) defines tests which are of concern to the individual as having
"guidance", "elective" and "educative" functions.

Tests also have been used within groups to describe comparative
relationships of individuals to specific criteria, such as ethnic/
racial, social class, sex and age distinctions between groups. Per-
haps this is what Manning meant when he referred to the "societal
function" of testing. Such a use of tests has the effect of denoting
societal values and promoting the homogeneity of societal bias.

There are at least three distinct areas in testing which have
become prominent for institutions. They are:

o In social institutions (as in the determination of
human performance in education and effectiveness of
educational programs).

o In economic institutions (as in the selection and
promotion of personnel in the business sector).

o In politico-legal institutions (as in the bargaining
for visibility by minorities in legislatures and
courts).



Please refer to extensive literature summaries provided by the
following studies: Eells et al., 1951; Anastasi, 1958a and b; Lesser,
Fifer and Clark, 1964; Miller and Dreger, 1973 (ethnic, racial and
social differences); Terman and Tyler, 1954, Maccoby, 1966 and Kimura,
1973 (sex differences); and Inhelder and Piaget, 1964 and Kamii, 1971
(developmental differences).

Manning (1968), p. 260.
Tests, in the first two instances, are used primarily to screen
and place individuals for specific purposes as well as substantiate
necessary policy decisions so that the institutional effectiveness can
be insured, and to select and promote individuals in employment.
Manning describes these functions as the "prescriptive," "evaluative,"
and "selective/distributive" functions of testing, respectively.

Attempts have been made to resolve the individual/group/institu-
tional conflict in the legal and legislative arenas, and these man-
dates have been presented to administrators and researchers for
implementation.

Testing has been used as an invaluable asset to the educational
policymaker. An effective policymaking capability requires that a
test provide the administrator with test scores which display:

(1) Reasonable psychometric stability in the theoretically
and operationally determined aspects of reliability
and validity.

(2) Relevant interpretive framework to provide equity in
treatment of groups, parsimony in allocation of funds
and continuity or compatibility with prior research
data base.

The problem areas found within the functional dimension became
critical when legal and legislative controls over the use of tests
began to dictate the policy interpretations and implementation of
the concept of equal educational opportunity.

The court rulings on desegregation in 1954 began the climate
for renewed discussion on the subject of equal educational opportunity.
The Civil Rights legislation of 1964* intensified that concern by
providing, among other directives, a mandate to respond to the use
of testing in employee selection.

Kirp (1974) discusses the fact that the constitutional rights
of students may be infringed by the testing process. It has been
primarily the issue of accessibility to educational facilities that
may lead to later job opportunities that has led courts to pass
judgments on the use of testing.

*Huff (1974), pp. 246-269.

Unfortunately, there are too many nuances of interpretation to
be found from a reading of the law, and problems arise in attempting
to translate the mandates into policy decisions. Several federal
agencies are in the process of deciphering the meanings of statutes
in an attempt to prepare guidelines for testing, primarily in employee
selection. Flaugher (1974) highlights some of the problems inherent
in recent decisions, and the practical consequences of the legislative
mandates and court litigation in terms of policy implementation.



Goodlad (1971) discusses the changing context of equal educational
opportunity, pointing out the difference between "quantity and avail-
ability." He asks the question: "...how much constitutes a minimum
(or later adequate) core and how easy is it to gain access to the
system?" He goes on to explain that it is the last question that
"...provides a breeding ground for questions about equal educational
opportunity; for example, to what extent and on what basis is access
difficult for some individuals and groups?"

He distinguishes between the terms "educational opportunity" and
"equal educational opportunity." It is evident that before court liti-
gation, the historical goal of "equal educational opportunity" was in
many respects realized only with respect to the dominant cultural
group. It satisfied their needs, their aspirations and the ultimate
aim of drawing the nation under a common educational standard.

Even though there were special classes for those immigrants who
needed to learn English, essentially there were no special programs
in public schooling for academic tutelage (Brickman and Lehrer, 1972).
This is an interesting aspect of that educational era, in view of the
number of compensatory educational programs that presently exist as a
result of several legislative mandates for poor and minority groups.

Kirp (1974), pp. 7-52.

Goodlad (1971), p. 4.

Ibid., p. 4.
B. THE STUDY OF CULTURAL GROUPS

The existing paradigm of testing was applied initially to
studies of group differences and, subsequently, to studies attempt-
ing to eliminate group differences.

DEFINING GROUP DIFFERENCES

A preponderance of the literature documenting group testing has
come from studies of race comparisons which depict Black and White
differences in intelligence and achievement (Porteus and Babcock,
1926; Klineberg, 1935; Shuey, 1958; Miller and Dreger, 1973).

Miller and Dreger describe the historical sequence in which the
comparative research on race has occurred:

    "Most of the comparative research on race has been done
    within a normative framework, with the behavior of whites
    being the norm from which blacks deviate. Earlier research
    was directed primarily at attempts to measure and describe
    these deviations... More recently, differences between
    the races were interpreted within a social pathology frame-
    work... Spread throughout this review is evidence of a
    turn to another way of looking at differences. We now
    recognize that in spite of shared values, there are a
    number of very real cultural differences between blacks
    and whites, and that these differences cannot be equated
    with inferiority as they have been in the past."*

■ ' 

Several studies concerning other cultural groups are dispersed
throughout the literature. Such studies include Spanish-speaking
minorities (Anastasi and deJesus, 1953; Anastasi and Cordova, 1953;
Zirkel, 1972), Oriental/Asiatic groups (Porteus, 1939; Lesser, Fifer
and Clark, 1964) and Indian-American groups (Klineberg, 1929;
Havighurst, Gunther and Pratt, 1946; Anastasi, 1958a).

Most of these studies have revealed statistical results that
show the mean responses of most minority groups to be below
the mean response of the compared dominant group. As a
result of that statistical benchmark, there has been a continued
proliferation of studies identifying specific cultural antecedents
that may be responsible for differences between groups, especially
as the antecedents are related to performance on mental ability
tests. These cultural antecedents have included socio-economic
status and mobility, family background and child rearing practices,
rural-urban geographic location, segregated-desegregated school
environments, and others. A great deal of interest has centered
around the study of these cultural influences and their consequences
on characteristic patterns of learning ability among these groups,
as well as the mean performance levels on school achievement tests.

*Miller and Dreger (1973), p. 1.
Lesser, Fifer and Clark (1964) have been concerned with learning
patterns among various groups and stressed the fact that certain
groups may have a greater advantage in learning if different learn-
ing modalities were examined.

. ^ * ' » . ./ . . ■ 

Dreger (1973) and L'Abate et al. (19[illegible]3) provide literature
summaries on most of the significant comparative research on intell-
ectual functioning and educational achievement. Another comprehensive
study includes the major Equality of Educational Opportunity Survey
(EEOS), a national study conducted by Coleman et al. (1966). Other
studies have been made by Mosteller and Moynihan (1972) and Jencks
et al. (1972). These authors have been instrumental in analyzing
massive amounts of data, revealing several determinants of educational
inequality. With respect to educational achievement and standardized
testing, these data suggest that there is a definite achievement gap
between the "dominant" group and "minority" groups and that this gap
widens as these groups move through school.

Levine (1972) states that more than 75 percent of pupils, parti-
cularly low-income students, are "at least two years below the
national average in reading by the time they reach the seventh or
eighth grades." Mayeske (1969) discusses at least three other
types of achievement test score performances for different racial-ethnic,
regional and socio-economic groups. He states: "Verbal ability shows
the characteristic decremental learning curve over grades, while Reading
Comprehension is almost linear. For all races, mathematics achievement
appears to approach a plateau much earlier than other subjects, with
the Negro students showing relatively little progress beyond the 9th
grade."

Levine in Brickman and Lehrer (1972), p. 42.

George Mayeske (1969), Technical Paper No. 1 (Office of Program
Planning and Evaluation).

ELIMINATING GROUP DIFFERENCES

Studies attempting to eliminate group differences have their theor-
etical origins in the numerous efforts to approach the concept of "culture
fairness". Each theoretical proposition soon was followed by empirical
studies on strategies for reducing cultural bias, and the idea of "fair-
ness" towards groups has been reinforced by political, legal and admin-
istrative actions. As a result, certain pedagogical implications have
been created.



APPROACHING THE CONCEPT OF "CULTURAL FAIRNESS"

The concept of culture-fair testing has been in existence since 1940,
when it was introduced theoretically by Raymond Cattell. However, the
reality of the need for a culture-fair testing perspective also was
recognized by several previous investigators who supported an adjustment
in the interpretations of test scores when applied to various cultural
groups (Klineberg, 1929; Daniel, 1932).

Cattell defined a culture=fair test as having spatial reasoning and 
.numerical test ; components , emphasizing the non-verbal aspect of mental 
ability. Previously , it had been held that thes5\components were not • 
primarily influenced by one's cultur^backgroundio^ educational attain- 
ment; and, therefore, the test' items were considerjed culturally-fair. . 
Cattell' s comparative results suggested that HT^ultirire-f air test could 
be used cros^-culturally, as well as within subcultures and social classes 

Other investigators have been influenced by such testing procedures
and have capitalized on the use of perceptual forms as the non-verbal
component to be used in culture-fair tests (Raven, 1956; Porteus, 1950).

Ibid., p. 18.

Subsequent research activity in the area of culture-fair testing
has been extensive. Several investigators have tried, since Cattell,
to contribute to the empirical definition of the concept by offering
fine shades of meaning, including the terms "culture-free" and
"status-free", or by recommending either the "fair use" or "no use" of
tests.

Culture-Free 

The term "culture-free" emphasized the process of selecting test
items that would have little or no cultural loading. Davis and Eells
(1954) attempted to produce a culture-free test by reducing the verbal
components of tests through the use of pictures depicting common
activities found at all levels of the American society. This test is
now non-functional. Other culture-free test constructions were
attempted, but most have not enjoyed any success because they did not
correlate with other tests (i.e., did not have concurrent validity),
nor were they useful in predicting to some commonly used criteria.

At least two explanations have been given for their failure. They
are: a) It is difficult or impossible to create a relevant test that
is not culturally-loaded so as to satisfy many of the required uses of
tests; and b) By failing to change both the cultural loading of the
test and the criteria to which the test predicts, the concept of
"culture-free" was non-functional.

Status-Fair

Jensen (1968) has suggested that the term "status-fair" be used in
the place of the term "culture-fair". He believed that the latter term
should be used as an anthropological term, one which would invite
discussion of truly cross-cultural testing between two or more distinct
cultures. However, he feels that present discussions in the United
States are centered around social class and ethnic differences within a
national culture, and therefore they should be treated in that context.

It can be inferred from his writings that he believes that testing
was originally designed from a European, upper-class educational
tradition and that the testing format and content, especially
intelligence testing, have always had a built-in class bias, although
not necessarily a built-in cultural bias.



In illustrating this point, the American society was seen as a
single national culture with only distinct class and ethnic differences.
Therefore, culture-fair was deemed an inappropriate label that should in
fact be "status-fair". In his discussion of criteria for establishing
"status-fairness", he states that for a test to be judged as being fair,
it must be:

    o capable of revealing status differences where such differences
      are due to genetic factors as well as cultural factors;
    o capable of having predictive validity, whereby the test is
      biased in favor of one group over another;
    o capable of revealing lower environmental correlations to test
      scores;
    o capable of showing resistance to practice gain and minimum
      transfer across equivalent forms of tests.

Empirical research studies have substantiated the importance of the
socio-economic status (SES) variable in testing, especially in areas of
intelligence and educational achievement. Unfortunately, however, the
differentiation and control of other antecedent factors besides SES,
such as ethnicity, sex and demographic characteristics, have
characterized most of the studies as emphasizing descriptive methodology
instead of experimental methodology.* The former approach has limited
much of the potential for generalizability in data involving the study
of SES. There has also been considerable concern with the definition of
social class indexes between diverse cultural groups.

Culturally-Optimum

Darlington (1971) uses the concept of "cultural optimality" instead
of the concept of "cultural-fairness". He divides the use of the term
into two components: a) "a subjective, policy-level question concerning
the optimum balance between criterion performance and cultural factors"
and b) "a purely empirical question concerning the test's correlation
with the culture-modified criterion variable and whether that
correlation can be raised."** He explains that the concept of
culture-fair implies two conflicting assumptions used in test
construction and selection: a) the maximizing of test validity and
b) the minimizing of the test's discrimination against certain cultural
groups.

*L'Abate, Oslin, Stone (1973), Comparative Studies of Blacks and Whites.
**Darlington (1971), p. 79.

These two conflicting goals traditionally have been tolerated by
constructing tests with a high degree of reliability (discrimination
between groups) and a lesser degree of validity. The concept of cultural
fairness implies "a relative balance in the attainment of the two goals,
and that a mechanical advantage of either sets up a critical imbalance
and mutually contradictory definitions" when applied to the concept of
"cultural-fairness". Darlington concludes that the choice of the
priority of goals is a policy-level decision. Only after that decision
has been made can there be a psychometric procedure resulting in the
construction of a "culturally-optimum" test.

Research studies concerned with the psychometric constraints on
predictive models of testing as they have been applied to the concept of
cultural-fairness have been numerous (Cleary, 1968; Linn and Werts,
1971; Thorndike, 1971). The most comprehensive research study on the
differing value perspectives of test prediction and their psychometric
implications has been completed by Cole (1972).* She lists all of the
models to date that deal with the definition of culture-fairness or the
concept of cultural optimality in the selection of minority group
members for employment or college programs. In seeking a practical
application for the Darlington concept, Cole relates it to the problems
of employee selection and college admission, but avoids a discussion of
its application in early and elementary education.

The weakness in the Darlington concept of "cultural optimality" seems
to be that it skirted the theoretical issues of reliability and validity
when dealing with different cultural groups.



*Cole (1972) lists six models: quota, regression, employer, Darlington,
Thorndike and equal opportunity models.

Fair Use

Thorndike relates the fairness of a test to its "fair use".* As can be
noted, he does not restrict himself to the term "cultural" in his
definition. He suggests the following:

    "If one acknowledges that differences in average test performance
    may exist between population A and B, then a judgment on
    test-fairness must rest on the inferences that are made from the
    test rather than on a comparison of mean scores in the two
    populations."

It can be concluded from Thorndike that "fairness" can be approached
on a conceptual basis if we assume that a significant relationship
exists within a group, i.e., between a test and its criterion variable.
The basis for inferences of whether the test was fair or unfair,
therefore, would be in a comparison of the pattern of relationships
between the two groups.

In examining the literature on "culture-fair" testing it is within
the conceptual analysis of the term "fair use" of tests that a
paradigmatic mode of analysis of "culture-fair" emerges. The mode of
analysis that presents itself is known as "equivalence", and this term
will be discussed more fully in Part II of this paper.



No Use

There have been several calls for a moratorium on testing. The
Association of Black Psychologists called for a moratorium on "the
repeated abuse and misuse of the so-called conventional psychological
tests", as they are "unfair and improperly classify Black children."**
The Human Relations Conference of the National Education Association
called for a stop to the school testing of minorities.*** Some state
legislatures, also, such as the California Assembly, have been sensitive
to the discriminatory effects of testing.****

*Thorndike (1971), p. 63.
**Williams, R. L. "Black Pride, Academic Relevance, and Individual
Achievement," The Counseling Psychologist, Vol. 2, No. 1, 1970, p. 18-22.
***National Education Association, Conference Report on Testing, 1972.
****Mercer, Jane (1974a), p. 138-139.



THE REDUCTION OF CULTURAL BIAS

On the whole, results reported from studies of group differences
provided very depressing forecasts for the educational futures of most
of the cultural groups that had been investigated. Even though some
studies documented in great detail the presence of culture-specific
information in tests and the need to eliminate such information (Eells,
1951; Davis and Eells, 1953), the shift toward the elimination of
culture-specific information was very gradual. It was not until about
two decades later that the theoretical and empirical implications of
the study of test bias became an actuality. At that time, the apparent
obliquity of test results of the minority group from the test results
of the dominant group became less meaningful when the testing
instrument itself was brought into question as being biased.

At least three sources of bias have been studied in existing
standardized tests. They are predictive, item and test-taking biases.

Predictive bias initially was defined to mean that the mathematical
model used to explain the behavior of the data predicted more accurately
for one group than for another group. Jensen (1968, p. 78) states:

    If a test has different predictive validities for different
    groups in the population and these differences cannot be
    attributed to differences in variance on the test or the
    criterion, it is likely that the test is biased in favor of some
    groups and not others.

Several investigators have examined the validity coefficients to see if
they were the same for various groups. The Educational Testing Service
(1966) studied the Preliminary Scholastic Aptitude Test (PSAT) and the
Scholastic Aptitude Test (SAT) to find out whether the test scores for
Black and White students predicted equally well to the grade point
averages in all groups. All groups were in integrated colleges, and the
findings suggested that predictor scores for both groups reacted the
same way whether placed in the common regression equation or the
specific equations of the two groups. However, it was noted that in one
of the colleges the grade point average was over-predicted when the
common regression equation was used.

Cleary (1966) examined the predictive bias between Black and White
students in integrated colleges. Results revealed that there was no
evidence of predictive bias in the tests for the Black students. In
her research, Cleary's definition of bias has been stated as:

    A test is biased for members of a subgroup of the population if,
    in the prediction of a criterion for which the test is designed,
    consistent nonzero errors of prediction are made for members of
    the subgroup.*

In other words, if consistent nonzero errors were obtained, the under or
over prediction of scores would not be at the disadvantage of either of
the groups involved. This takes for granted the use of a single
prediction equation used for both the majority and minority group.
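
Cleary's definition can be illustrated with a minimal sketch. It assumes a simple one-predictor least-squares line fitted to both groups together; the function names and the data are invented for illustration and are not taken from Cleary's study. The sketch fits the common line and then averages the prediction errors (actual minus predicted) within each group. A group whose mean error is consistently far from zero is, in Cleary's sense, being over- or under-predicted by the common equation.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_errors_by_group(scores, criterion, groups):
    """Fit ONE common line, then average (actual - predicted) per group."""
    intercept, slope = fit_line(scores, criterion)
    errors = {}
    for s, c, g in zip(scores, criterion, groups):
        errors.setdefault(g, []).append(c - (intercept + slope * s))
    return {g: mean(e) for g, e in errors.items()}


# Invented test scores, grade point averages and group labels.
scores = [400, 500, 600, 400, 500, 600]
gpa = [2.0, 2.5, 3.0, 2.4, 2.9, 3.4]
groups = ["A", "A", "A", "B", "B", "B"]

errors = mean_errors_by_group(scores, gpa, groups)
for group, err in sorted(errors.items()):
    print(group, round(err, 2))
```

With these invented figures the common line over-predicts group A (negative mean error) and under-predicts group B (positive mean error), which is the pattern the definition would label as bias.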

Stanley and Porter (1967) found the predictive validity of the SAT
to be about as "correlationally valid" in predominately Black colleges
as it is in predominately White colleges. They found the
interpretability of the test in Black colleges to be restricted,
however, because the distribution of scores displayed a highly skewed
curve.

Greene (1974) reviewed the literature and reported studies which
contained contrasting viewpoints about predictive bias; that is, the SAT
and ACT (American College Test) were poor predictors of performance
among Black students who came from segregated southern high schools and
entered integrated colleges (Clark, 1965), and among Black students in
predominately White colleges (Bradley, 196?).**

Linn and Werts (1971) discuss the problem of predictive bias
differently. They state that "the definition of predictive bias requires
a comparison of regression equations and is not equivalent to a
comparison of validity coefficients." They go on to say that "equal
validity coefficients can easily be obtained from quite different
regression equations. Therefore given a common regression equation for
two or more groups, the within-group validities can be substantially
different." These investigators grant that, for this definition to be
operational, there must be the assumption that the criterion is free of
bias.
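
The first point made by Linn and Werts can be checked with a small numerical sketch. The data and the function name below are invented for illustration. The two groups share the same test scores, and the criterion values of the second group are a linear rescaling of those of the first. The validity coefficient (the test-criterion correlation) is therefore the same in both groups, while the slope and intercept of the regression equation differ.

```python
from math import sqrt
from statistics import mean


def validity_and_line(x, y):
    """Return (correlation, intercept, slope) for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return sxy / sqrt(sxx * syy), my - slope * mx, slope


# Invented data: same test scores, criterion rescaled for group B.
test_scores = [1, 2, 3, 4, 5]
criterion_a = [2, 1, 4, 3, 5]
criterion_b = [2 * y + 1 for y in criterion_a]

r_a, intercept_a, slope_a = validity_and_line(test_scores, criterion_a)
r_b, intercept_b, slope_b = validity_and_line(test_scores, criterion_b)

print("validity:", round(r_a, 3), round(r_b, 3))
print("slope:", round(slope_a, 3), round(slope_b, 3))
print("intercept:", round(intercept_a, 3), round(intercept_b, 3))
```

A comparison of the two correlations alone would report no difference between the groups, although a single common equation could not predict equally well for both.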



*Cleary (1968), p. 115.
**Greene (1974), p. 181-182.



Comprehensive reviews of the literature on the bias in prediction
models have been given by Cole (1972) and Flaugher (1974). Cole gives
a technical summary of what constitutes bias in each of six models based
on different psychometric assumptions as well as the value judgments
involved in their selection. Flaugher discusses the definition of bias
in each model in non-technical summary and suggests the most probable
compromise in the use of one model, given the practical problems of
legal interpretations and policy implementation.

Item bias may be defined as the study of those clusters of items
that are particularly easy or difficult for one group when compared to
another group. In other words, most studies of item bias are concerned
with either the quantitative or qualitative analysis of item difficulty.
Quantitative item difficulty refers to the emphasis on rank order
analysis where judgments are made about test bias through statistical
procedures. Breland (1974) summarizes the operation:

    While these studies are labelled studies of 'item bias', they
    rarely attempt to analyze sources of deviation for outstanding
    items. The attempt has been usually to make some inference about
    the test as a whole by demonstrating the existence or lack of
    existence of a significant item x group interaction.*

Qualitative item difficulty usually refers to non-technical procedures,
whereby clusters of items are judged by their culture-specific
informational content. Some empirical verification of culture-specific
content is usually cited. One such empirical study was cited by
Armstrong (1972), where persons from various ethnic groups were asked to
judge those test items that were considered biased toward their group.
Even though the kinds of items selected among groups were very
different, selections of biased items within each group were similar.

The quantitative aspect of item difficulty analysis can be seen in
the Educational Testing Service (1966) study of item bias in PSAT and
SAT for Black and White students attending integrated colleges. They
found no significant "item x race" or "item x socioeconomic status"
interactions within groups. It was concluded that items were unbiased.
When investigators Stanley and Porter (1967) studied item difficulty
levels in SAT for students in predominately White and Black colleges,
contrasting results were found. For the Black students, item difficulty
levels were inclined much more toward the lower end of the scale,
prohibiting a normal distribution of scores. These results revealed
that the difficulty level of the SAT was unusually high for this group
of students.

*Breland (1974), p. 4.

Of particular interest in this article was the discussion of item
difficulty level and its relation to the predictive validity of the test
for the minority group examined. If the difficulty level of the test
items is such that subgroup response cannot be subjected to the normal
curve distribution model, at least two problems become evident. Either
the test items are too difficult for the group or the type of
populations that are characteristic in some minority group institutions
are different; and alternative, probabilistic models should be
investigated to adequately interpret traditional test results.

Cleary and Hilton (1968) studied biased test items on PSAT for Black
and White students attending integrated colleges. An item on the test
was considered biased if the performance on an item by group members
differed more than expected between groups on all other items included
in the test. Their conclusion was stated as "PSAT items cannot for all
practical purposes be considered biased for either race (White or Black)
or SES within race".* The phrase "for all practical purposes" seems
misleading in that some items were biased according to their definition
of item bias.

Angoff and Ford (1973) examined items on the PSAT from Black and
White students using correlational analyses to depict item difficulties
between groups. They found that some items were unusually difficult for
Blacks and went a step further to explain the content of the item. They
stated the areas of difficulties to be with "vocabulary and concepts
pertaining to unfamiliar places and experiences".

Breland (1974)** studied the cross-cultural stability of test items
on six different cognitive tests using responses from 10 different
groups. These data were drawn from data already collected by the
National Longitudinal Study of the High School Class of 1972, a study of
the Educational Testing Service funded by the Office of Education. There
was no adjustment made for SES levels, and it was argued that the
samples from each cultural group had been randomly selected.

*Cleary and Hilton (1968), p. 69.
**Breland (1974). Tests were: vocabulary, picture-number, reading,
letter-groups, mathematics and mosaic comparisons. The groups were:
American Indians, Blacks, Mexican-Americans, Puerto Ricans, other
Latin-Americans, Oriental-Americans, White Northeastern, White North
Central, White Southern and White Western.

Breland combined a "mechanical and subjective" approach to investigate
the instability of test items within each subgroup. By that term he
meant the adapted procedure used by Angoff and Ford, where the number of
items answered correctly by each group are normalized and to which
appropriate delta values are assigned. A cross-plot is constructed and
nine cultural groups are compared to the North Central White group. He
discusses his correlational analyses to involve the "line of best fit",
that is, he defines cross-cultural unstable items as "those with the
most aberrancy around the line of best fit for a particular group".
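
The cross-plot procedure described above can be sketched as follows. Several details are assumptions made for illustration and are not confirmed by the text: the delta scale is taken to be the conventional one with mean 13 and standard deviation 4 (larger values meaning harder items), the "line of best fit" is taken to be the major (principal) axis of the plotted points, and aberrancy is measured as the perpendicular distance of each item from that line. The proportions correct are invented.

```python
from math import sqrt
from statistics import NormalDist, mean


def delta(p_correct):
    """Proportion correct -> delta value (assumed scale: mean 13, SD 4)."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p_correct)


def delta_plot_distances(p_reference, p_focal):
    """Signed perpendicular distance of each item from the major axis."""
    x = [delta(p) for p in p_reference]
    y = [delta(p) for p in p_focal]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Slope and intercept of the major axis (assumes sxy is not zero).
    slope = (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return [(yi - (intercept + slope * xi)) / sqrt(slope ** 2 + 1)
            for xi, yi in zip(x, y)]


# Invented proportions correct on six items for two groups.
p_reference = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40]
p_focal = [0.85, 0.72, 0.61, 0.20, 0.40, 0.31]

distances = delta_plot_distances(p_reference, p_focal)
most_aberrant = max(range(len(distances)), key=lambda i: abs(distances[i]))
print("most aberrant item index:", most_aberrant)
```

In this invented example the fourth item (index 3) lies farthest from the line, on the side indicating that it is relatively harder for the second group than the other items would lead one to expect. Such an item would then be passed to the "subjective" stage, where its content is examined.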

The test results were not surprising. Vocabulary items were considered
most unstable among groups, and this was ascribed to the linguistic
varieties of the groups. There were categories in the mathematics test
that were relatively easy for the groups, while others were especially
difficult. Breland suggested that questions requiring a knowledge about
numerical relationships in life situations were less difficult. Certain
mathematical problems, such as "determining value of square roots of
whole numbers less than ten", were difficult. This conclusion reflected
serious problems in the attainment of certain basic mathematical
learning in the schools. Most aptly, Breland summarizes the findings as
follows:

    While the cross-cultural stabilities of some item types suggest
    problems in test construction, instabilities in other item types
    point to inadequacies in schooling.

The qualitative aspect of item difficulty analysis can be seen in
several literature studies which have been concerned more with the
content of item bias than with the technical aspects of defining item
bias. Discussion found in the investigations of Breland (1974) and
Angoff and Ford (1973) embraced both concepts of defining item bias:
"mechanical" (statistical analyses) and "subjective" (face-valid
content) judgments. Other investigations in item bias have included the
reasons for response choice, culture-specific informational content in
the test item and patterns of abilities reflected in the choice of item
response.

Breland (1974), p. 20.
Ibid., p. 51.

Brigham (1932) laid the foundation to the study of correct-wrong
responses among distractors by suggesting that incorrect responses were
deliberately chosen instead of being selected at random. In other words,
he provided an alternative method for studying biased item response.
Theoretically, Brigham's work with the College Entrance Examination
Board provides a framework in which to understand item response. This
framework could have an invaluable impact in understanding item choice
in various cultural groups where total "correct" responses are lower
than the dominant group. Brigham states:

    "It is possible to show that items which apparently have
    hundreds of possible answers, instead of five, show certain
    characteristic distributions of answers indicating concentration
    of errors."*

He goes on to point out:

    ". . . that the ultimate facts with which we are dealing are
    answers to questions. It is not necessary that these answers
    be scored or have values attached to them by some tester -- the
    answers may be studied in their own right.

    . . . the detailed study of answers to test items provides a
    completely sound and systematic approach to the study of errors
    and confusions in thinking."**

Later on in his text, he suggests "that we are nearer the truth in
conceiving of 'intelligence tests' as measuring the degree of
participation in the group mind . . ." and that "symbolic manipulations
are not random phenomena but subject to social control."

*Brigham (1932), p. 43.
**Ibid., p. 45.

Several investigators have used error item analyses as indices to
explain influences and classify cultural differences between groups
(Eells et al., 1951; Lawrence, 1957). The study of Eells and his
colleagues was primarily concerned with "intercultural differences among
White groups". From the point of view of content, these authors were
interested in examining those items which revealed:

    o unusually large status differences
    o unusually small status differences
    o sets of items showing contrasting amounts of status difference
      although similar with respect to form of symbolism
      (letters, pictures, numbers and type of question)
    o significant differences between two low status groups (Old
      American and Ethnic).

With reference specifically to content, vocabulary items were stressed
to be most important in dividing the cultural groups. In summary, these
authors state:

    Practically all of the items which show unusually small
    differences either are non-verbal in symbolism or are expressed in
    relatively simple everyday vocabulary and deal with objects or
    concepts which are probably equally familiar, or equally
    unfamiliar, to pupils of both status levels.

Another finding suggested that "there were a large substantial number of
items showing large status differences for which no reasonable
explanation was noted." It was advised in this instance that caution
should be taken "in accepting the idea that all status differences on
test items can be readily accounted for in terms of the cultural bias of
their content."

From a different perspective, Roberts (1970) summarizes an evaluation
of linguistic item biases found in four tests that are used frequently
to measure language development and abilities in young children: the
Peabody Vocabulary Test, Wechsler Pre-School Primary School Intelligence
Test, Metropolitan Readiness Test and Illinois Test of Psycholinguistic
Abilities. She points out that "substantive bias in standardized tests
can be found in culture-specific vocabulary items, culture-specific
pictures, culture-specific information questions and even
dialect-specific linguistic questions."

Ibid., p. 208.
Eells (1951), p. 357.

Wolfram (1974) cites examples of cultural bias in diagnostic tests
for articulatory development, auditory discrimination, grammatical
development and vocabulary acquisition. From a sociolinguistic point of
view, his concept of task bias embraces comprehension of instructions
and interpretations of an appropriate response set. It also includes
specific linguistic item bias found in the phonological and lexical
differences between dialect responses and test commands given in
Standard English. Other studies have revealed linguistic item-bias in
standardized tests used in grade school in reading (Meir, 1973) and
other subject areas (Cicourel et al., 1970).

Lesser, Fifer and Clark (1964) attempted to reduce cultural content
in their study of six and seven year old children from two social
classes (middle and lower) and four cultural backgrounds (Chinese, Jews,
Blacks and Puerto Ricans).

Their "culture-fair" materials were described to "presuppose only
experiences that are common and familiar within all of the different
social class and ethnic groups in an urban area." One finding in this
study was that after the item as stimulus was controlled for cultural
differences between groups, patterns of abilities among groups remained
different for each ethnic group. In addition, the authors state that
"once the pattern specific to the ethnic group emerges, social class
variations within the ethnic group do not alter this basic
organization".

The results of the studies appear to be inconclusive in their attempts
to detect and/or remove cultural bias found in existing tests. This has
been so because the primary objective of the studies has been to
eliminate differences in performance between groups, not considering
that these differences may not be manipulable through technical analysis
or change of content. The overwhelming problem seems to be in the
interpretation of these salient differences, so as to make valid
inferential statements about the test results among groups. To date,
this question has not been fully explored in the existing literature
studies.

Roberts, p. IV-13.
Lesser, et al., p. 567.

Test-taking bias may be defined as a mismatch between the normative
expectations of the test designer/examiner and the personality factors
and learned skills of the test taker. The use of standardized testing to
measure psychological and educational behaviors is accompanied by a set
of standardized test-taking behaviors. In other words, standardization
is not only controlled through externally valued criteria.* Briefly,
such criteria may include the ability to follow instructions; the
ability to work persistently and/or speedily through a series of tasks;
and the ability to manipulate numerical, geometrical and linguistic
relationships. These criteria provide the necessary framework from which
the test constructor/examiner must design standardized test-taking norms
which may be at complete odds with the personality factors and the
acquired skills of the intended test-taker.

Several labels have been used in the literature to describe these
test-taking behaviors. Jensen (1968) provides the following summary:
"'motivation', 'test anxiety', 'test sophistication' and other
test-taking attitudes, 'personal tempo', 'clerical skills' and
'susceptibility to distraction'".**

A fairly large body of research has been done on examiner bias in
testing, especially when the race of the examiner is different from that
of the person being examined (Rosenthal, 1966; Sattler, 1970; Epps,
1974). Some attention has been given to "subject bias" in psychological
research in general (Lester, 1969), and in the standardized testing
situation in particular (MacKay, 1970; Roberts, 1970; Wolfram, 1974).

Rychlak (1973) offers a theoretical learning framework that can be
very useful in explaining some of the traditional assumptions of
testing. He makes the following case:

    It is pure fiction to assume, as many E's [experimenters] do,
    that S's [subjects] conceptions of the experimental purpose
    (i.e., design) are 'chance' variations to be cancelled out by
    another S's conceptions. A major aspect of the learning going
    on in all human studies has to do with the informal study being
    conducted by S as to 'what is this all about?' This is literally
    a controlled dimension amounting to a kind of social role (or
    rule) which enters into the differential variance accounting
    for significance in the eventual statistical tests.*

*These criteria have been discussed on several occasions: Jensen
(1968), Brickman and Lehrer (1972), and Jencks (1972).
**Jensen (1968), p. 70.



Much of the test-taking bias can be explained from a socio-linguistic
point of view given the fact that one's language use and styles provide
sufficient familiarity with the type of tasks and the pattern of
response required in a particular testing situation. Many literature
studies have documented the "social control" of language use and style
in many standardized testing situations.

Roberts (1970) states that the "verbal style required by the test
can be culture specific". She gives the example that the cultural norms
for verbal interchange may be very different from the norms of the
test-taker's own "speech community".

MacKay (1970) believes the manipulation of a subject's test-taking
behavior is based on at least two assumptions: first, that the test
designer needs to envision a model in which all of the subjects' actions
are predictable; second, that the subjects' actions can be controlled
through testing format and procedures. In summary, MacKay points out that
testing theories are based on the assumption that the administration of
the test will take place "in a non-contextual social setting with a
non-contextual cognitive orientation".

L'Abate, et al. (1973) summarize the non-intellectual factors that
seem to influence achievement testing outcomes as: self-concept,
motivation, level of aspiration, attitudes, etc. Even though these factors
are deserving of research study in their own right, barring the
inadequacies of theory and methodological procedures, these variables are
said to be present during the complex testing process, and their
measurement must be included because they are considered additional
sources of variation.



*Rychlak (1973), p. 3.

Brigham (1932) seems to be discussing test-taking bias as a group
phenomenon when he used the terms "intrinsic causes of group factors".
He believed that "it was possible to show that group factors may either
be suppressed or generated by experimental conditions of testing, such
as timing..." He concludes by stating:

"There may be other irrelevant testing conditions set which tend
to alter the results one way or another. The study of these
conditions by experimental variation and control is a most
important problem and one which should take precedence over the
mathematical systems of interpretation which have now gone far
beyond the test data."*

THE POLITICAL, LEGAL AND ADMINISTRATIVE ACTIONS

Efforts to achieve cultural fairness were reinforced by political,
legal and administrative actions.

Political Actions. The concept of cultural fairness in testing was
extended from a scientific/academic debate to the political forum as a
result of numerous writings suggesting national inquiry into the use of
tests (Hoffmann, 1962, and Black, 1963) and as a result of specific
federal legislative mandates (EOA of 1964 and ESEA of 1965) and state
legislative mandates.

The "fair use" of standardized tests, in relation to various groups,
has been unalterably associated with the two concepts of "equal
educational opportunity" and "educational accountability". The curious
connection between the use of standardized testing and those concepts
becomes gradually more prominent when one peruses the goals of
nation-wide and state-wide testing programs.

Brigham (1932), p. 44.

Please refer to Clasby, Webster and White (1973) for an extensive
summary of state legislative mandates authorizing the use of tests to
assess educational programs.

Goodlad, J. (1971) distinguishes between the terms "educational
opportunity" and "equal educational opportunity" through a historical,
social and economic context. He explains these terms in changing contexts
of educational history.

"Educational accountability" can be defined as the demand by various
funding sources that educational systems provide reliable information on
student learning to justify the allocation of resources and educational
expenses. (Please refer to Tyler (1973) and Webster (1973) for specific
rationales for the renewed interest in the need for present-day
educational accountability.)

There has been a heavy "federal initiative" in sponsoring
compensatory educational programs for poor and minority groups. The
literature abounds with programs, plans and experiments to give many such
children an equal chance in the educational system. However, compensatory
education programs, as measured by standardized tests, have revealed only
short-term gains that have not been sustained for long periods of
time.* The use of standardized tests to assess educational program
outcomes has had mixed reviews in the literature. It became increasingly
clear that test scores could not be translated easily into program
objectives, for two reasons: first, many of the standardized tests used
were not originally designed or intended as evaluative tools; and
secondly, the utility of aggregated test scores as the sole indicator of
the effectiveness of the programs left much to be desired.

Millions of dollars are being spent in federally funded research,
development and evaluation projects that concern the quality of education
of young, minority children. The high concentration of minorities in
programs for the poor has highlighted issues of testing in public debate.
Such programs as Head Start in early childhood education and Title
programs in elementary and secondary education have been created through
the mandates of such legislation as the Economic Opportunity Act (EOA of
1964) and Elementary Secondary Education Act (ESEA of 1965).

Head Start has a current budget of $400 million, and the Nixon
Administration suggested that there be a 10 percent increase in the coming
fiscal year. As a result of ESEA, Title I, nearly $1 billion has been
allotted to schools with concentrations of children from homes in poverty,
and the Act requires local districts to evaluate the effectiveness of
the educational programs that emerge. With evaluation of the educational
programs added as an obligation for the successful execution of
federally-funded projects, the need for general guidelines for federal
policy on the screening, selection and use of tests seems to be of
paramount importance.

*Please refer to the extensive studies involving national evaluations
of compensatory educational programs (Cicarelli, 1969; Coleman et al.,
1966; Follow-Through Evaluation, 1973).

In many instances, there has been a concerted effort not to equate
standardized testing with the total design of the evaluation. In spite
of the fact that the state-of-the-art of the testing of young children
generally is in a fluid state, there has been a tendency to rely too
heavily on the results of standardized testing, especially when dealing
with diverse cultural groups.

A noteworthy statement was made by Campbell and Erlebacher (1970)
as they discuss previous evaluations of compensatory educational projects.
They state that "commitment to reality testing (referring to true
experiments) on ameliorative programs should involve acceptance of the
fact that some programs will turn out to be ineffective." They go on to
state that "when such outcomes are encountered, the political system
should seek alternative approaches to solving the same problem, rather
than abandon all remedial efforts."*

There has been an increasing growth in state legislation authorizing
state-wide testing programs of schools and school systems. The response
of the States to the primary "federal initiative" has been to introduce
several versions of accountability legislation. Webster (1973) records
and studies approximately 54 pieces of legislation. She states that 34
were dated in 1971 or 1972, and 12 in 1969 or 1970. She concludes that
"over 80% of the legislation was introduced in the past four years."

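The arithmetic behind the "over 80%" figure can be checked directly from
the counts cited above. The following is only a minimal sketch; the
variable names are illustrative and the total of 54 is the approximate
figure given in the text.

```python
# Counts as reported in the text (attributed to Webster, 1973).
total_bills = 54        # "approximately 54 pieces of legislation"
dated_1971_72 = 34
dated_1969_70 = 12

# Bills introduced in the four most recent years covered by the study.
recent = dated_1971_72 + dated_1969_70
share_recent = recent / total_bills

print(f"{recent} of {total_bills} bills fall in 1969-1972: {share_recent:.1%}")
```

On these counts the share is roughly 85 percent, which is consistent with
the "over 80%" statement.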
The problem with this influx of state-wide testing programs has been
the undue reliance on test-related information to support policy
decisions. This has been especially noticeable with the granting of
financial rewards. Dyer and Rosenthal (1973) break down this problem into
four salient questions:

o Does one use the funds to reward the districts that show up
high on the indicators?

o Does one withhold the funds to punish the districts that
show up low on the indicators?

o Does one use the funds to help upgrade the districts that
show up low on the indicators and thereby withhold funds
from those that show up high?

o Or can one find a way to allocate the funds so that all
districts will have an incentive for constantly improving the
quality of their schools?

*Campbell and Erlebacher (1970), p. 203.

Maureen Webster and Naomi White (1973) discuss "minimal skills",
state-wide educational assessment programs and the changing context of
educational policy.

Webster (1973), p. 65.

Dyer and Rosenthal (1973), p. 122.

The complexity of this problem reveals the varied emphasis given to
the role of standardized testing in each state and the necessary linkage
between federal administrative policy on evaluation and the general use
of tests to evaluate educational objectives at the state level. Since
the constitutional authority for education lies in the domain of each
State, it is the responsibility of each State to resolve the question of
what criteria it will use to judge equitable educational performance, and
it is the responsibility of the State to make sure that the chosen
criteria do not systematically discriminate against certain groups more
than others.

Necessarily, the federal policy-maker will be concerned with policy
options involving testing alternatives while the state policy-maker must
reckon with policy analysis and the implementation of testing objectives.
Two trends make a collaborative venture important to both federal and
state administrative agencies. First, more than 75% of current state
assessment programs rely totally or partially on federal funds. This
may be modified in part by revenue-sharing funding proposals in education.
Secondly, a distinction must now be drawn between "federal educational
policy" and "national educational policy". Webster (1973) suggests:

The phenomenon of national coalitions has reached a point where
it is possible to distinguish, at least conceptually, between
federal educational policy which guides the activity of the
federal government and national educational policy positions which
represent a wide array of concerns of interest groups and
decision-makers.*

*Webster (1973), p. 53.

The latter group will represent an interplay between both federal and
state assessment activities. It may be within this realm that the use of
testing as a tool will be put in its proper perspective.

Legal Actions. Existing tests have been documented to have
systematically discriminated against minority and poor children so that
they appear to perform poorly on a variety of tests under various
circumstances. This documentation can be cited in various class-action
suits and court decisions (Mercer, 1974a, and Williams, 1971).

Robert L. Williams (1971) provides examples of some of the racially
discriminatory effects of testing. They are summarized below:

o ... case of Diana et al. vs. California State Board of
Education led to a decision in favor of a Mexican-American
child whose intelligence had been woefully underestimated by
the Binet ...

o ... case of Hobson vs. Hansen in Washington, D.C., set an
early precedent in the decision ordering the track system to
be abolished since unfair ability tests were used in sorting
the children into tracks ...

o ... the case of Stewart et al. vs. Phillips et al. charges
that children are being placed in special classes irrationally
and unfairly ...

o ... case of Armstead et al. vs. Mississippi Municipal
Separate School District et al. involved the use of the GRE for
employment and retention of Black and White teachers.

David Kirp (1974) provides a discussion of the sorting of individuals 
by educational institutions that has led to "judicial inquiry". He gives 
particular interest to "exclusion", "ability grouping" and "assignment to 
special education". 

Volumes of filed suits alleging discriminatory hiring practices
because of testing can be found in the archives of the Equal Employment
Opportunity Commission in Washington, D.C. To date, a summary of this
literature has not been attempted.

Administrative Actions. Administrators at the federal and state
levels have sought to satisfy the above requirements of political and
legal jurisdictions, but in doing so they have found themselves in the
dilemma of trying to meet the demands of minorities without the necessary
theory and data for effective and equitable program implementation.

One of the demands of minorities has been for a more accurate
labelling and placement of minorities. Also to be considered are the
emotional effects on these individuals of such placement, and the denial
of future educational and job opportunities that may arise. Again,
Robert L. Williams (1971) cites examples, as summarized below:

o A document from a group of Black psychologists reviewed by the
Unified School District of San Francisco illustrated that
although Black children comprised only 27.8 percent of the
total student population in San Francisco Unified Schools,
they comprised approximately 47 percent of all students in
educationally handicapped classes and 53.3 percent of all
students in educable mentally handicapped classes.

o In another instance in St. Louis, during the academic year
of 1968-1969, Blacks comprised approximately 63.6 percent of
the school population, whereas Whites comprised 36.4 percent.
Of 4,020 children in Special Education, 2,975 (76%) were
Black; only 1,045 (24%) were White.
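The degree of over-representation in these examples can be expressed as a
simple ratio of a group's share of a placement to its share of enrollment.
The sketch below is illustrative only: the function names are hypothetical,
and the ratio is a common descriptive index rather than a measure used by
Williams. Note that the St. Louis counts as cited work out to about 74 and
26 percent, slightly different from the parenthetical 76% and 24%.

```python
def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


def representation_ratio(pct_in_placement, pct_in_enrollment):
    """Ratio above 1 means the group is over-represented in the placement."""
    return pct_in_placement / pct_in_enrollment


# St. Louis, 1968-1969: counts and enrollment shares as cited in the text.
special_ed_total = 4020
black_special_ed = 2975
white_special_ed = 1045

black_pct = share(black_special_ed, special_ed_total)
white_pct = share(white_special_ed, special_ed_total)

black_ratio = representation_ratio(black_pct, 63.6)
white_ratio = representation_ratio(white_pct, 36.4)

# San Francisco: share of educable mentally handicapped classes (53.3%)
# against share of total enrollment (27.8%), both as cited in the text.
sf_emh_ratio = representation_ratio(53.3, 27.8)

print(f"St. Louis: Black {black_pct:.1f}% (ratio {black_ratio:.2f}), "
      f"White {white_pct:.1f}% (ratio {white_ratio:.2f})")
print(f"San Francisco EMH ratio: {sf_emh_ratio:.2f}")
```

A ratio near 1 would indicate placement in proportion to enrollment; the
San Francisco figure is close to 2.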



Jane Mercer (1974b) supports the view that the chance of "mislabelling
and erroneous placement" increases as one's milieu at home differs
from the cultural milieu of the school. She estimated that "at least 70
percent of the children in classes for the educable mentally retarded in
two southern California school districts were mislabelled as mentally
retarded."*

Testing of Spanish-surnamed children has intensified the debate over
the discriminatory effects of testing. Zirkel (1972) reviews the existing
literature to reveal that there are linguistic, cultural and psychological
difficulties for Spanish-speaking children on standardized tests of
ability and achievement. The language/cultural references made in the
test content and the frustration of translating the subtleties of the
English language into appropriate Spanish adaptations are the primary
variables that have been found to be discriminatory against many
Spanish-speaking children performing on well-known standardized tests.

*Mercer (1974b), p. 6.

Looking back on the early 1960's, many critics have discussed the
various problems involving the lack of necessary instruments, strategies
or data needed to implement programs relevant to various cultural groups.
There was a lack of basic knowledge about the lifestyles and the
educational problems of the minority groups (Berke and Kirst, 1972). There
was a deficiency in the interpretative framework (existing monocultural
paradigm of testing) which could not support the conclusions drawn about
these various groups. Unfortunately, this framework was dependent heavily
on the results of standardized testing, and oftentimes new programs would
show unfavorable results (Fein and Clarke-Stewart, 1972). There was also
a need to make immediate decisions about strategies for the implementation
of program goals before a format for "social experimentation" had been
empirically verified (Timpane, 1970). Naturally, such a structured
experimentation would have provided, at least in part, the empirical
base for needed policy decisions.

Because of the aforementioned reasons, and others, testing was
considered a "dependable" administrative tool which, under the existing
shortage of information, could provide reliable and valid data about the
performance of various cultural groups. In reality, the administrative
level to which testing is most helpful is debatable. Nevertheless, the
effects of the interpretability of aggregate test scores must be weighed
in a broad perspective. Traditionally, standardized testing has not
provided this kind of perspective and various cultural groups have been
considered at a distinct disadvantage when this kind of testing has been
used. More often than not, the interpretability of the test results
continues to be considerably influenced by the established cultural norm.

Klitgaard (1974) provides a set of alternatives in the use and
interpretation of test measures and statistics that may be of interest to
the decision-maker. These alternatives were to demonstrate the theoretical
feasibility of interpreting achievement data beyond test score averages
to the examination of the "distribution of scores". The parent study
involved the Education Voucher Demonstration Project which supported a
variety of objectives including "increased parental influence and
satisfaction with schools", "more diversity of educational programs" and
"ultimately better education".

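The difference between reporting an average and examining the
distribution of scores can be illustrated with a small sketch. The scores
and the cutoff below are invented for illustration, and the summary
statistics chosen here are assumptions rather than the specific
alternatives Klitgaard proposes.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical scores for two programs; invented for illustration only.
program_a = [48, 49, 50, 50, 50, 51, 52]
program_b = [20, 35, 45, 50, 55, 65, 80]


def describe(scores, cutoff=40):
    """Summarize a score distribution beyond its average."""
    q1, median, q3 = quantiles(scores, n=4)
    return {
        "mean": mean(scores),
        "spread": pstdev(scores),
        "quartiles": (q1, median, q3),
        # Share of students falling below an (assumed) minimum score.
        "share_below_cutoff": sum(s < cutoff for s in scores) / len(scores),
    }


summary_a = describe(program_a)
summary_b = describe(program_b)

for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, summary)
```

Both hypothetical programs have the same average, yet only the second has
students below the cutoff, a difference that an average alone conceals.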
It may be said that culture-fair testing as a strategy has found
itself in a reactionary position; that is, it has attempted to change
the existing testing format, content and psychometric operations. At
best, these attempts have been inaugurated slowly, many times with
discouraging results. Has the concept of culture-fair testing been
doomed a failure? The answer to this question is unclear as one reviews
the literature. However, it may be said that there are certain
pedagogical implications.

THE PEDAGOGICAL IMPLICATIONS 

The application of efforts to reduce cultural bias has acted as an
impetus for certain pedagogical implications in the form of several
educational testing formats and procedures. These implications have had
the effect of minimizing the use of standardized testing, while at the
same time embracing the goals of culture-fair testing through a
de-emphasis on the use of norms for the dominant group as the standardized
reference. At least two of these formats and procedures will be
discussed in this section. They are: criterion-referenced testing, with
particular reference to the National Assessment of Educational Progress
(NAEP); and the System of Multi-Cultural Pluralistic Assessment (SOMPA).

Criterion-Referenced Testing is by no means new to the field of
testing,* but certainly it has become today a viable alternative to the
much-debated use of traditional testing by norm-referenced standardized
tests. This approach has been seen as a vehicle to reinforce
"culture-fair" testing goals in that group performances are not compared
to a "standardization group". Instead, group performances are assessed
through the attainment of "criterion skills"; therefore, they are not
dependent upon the performance of previous groups for interpretation.

*Airasian and Madaus (1974), p. 78, cite E. L. Thorndike as discussing
the difference between norm- and criterion-referenced tests in 1913.

**Refer to Airasian and Madaus (1974) for an exposition of trends
leading to the use of criterion-referenced measures (pp. 76-77).
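The contrast between the two kinds of interpretation can be sketched as
follows. The scores, the norm groups and the 80 percent mastery level are
all hypothetical, chosen only to show that a norm-referenced result shifts
with the comparison group while a criterion-referenced result does not.

```python
from bisect import bisect_left


def percentile_rank(score, norm_group_scores):
    """Norm-referenced: percent of the norm group scoring below `score`."""
    ordered = sorted(norm_group_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


def meets_criterion(items_correct, items_total, mastery=0.8):
    """Criterion-referenced: compare with a fixed (assumed) mastery level."""
    return items_correct / items_total >= mastery


# One student's result on a hypothetical 20-item test.
student_correct, n_items = 17, 20

# Two hypothetical standardization groups.
norm_group_1 = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20]
norm_group_2 = [17, 18, 18, 19, 19, 20, 20, 20, 20, 20]

rank_1 = percentile_rank(student_correct, norm_group_1)
rank_2 = percentile_rank(student_correct, norm_group_2)
mastered = meets_criterion(student_correct, n_items)

print(rank_1, rank_2, mastered)
```

The same performance ranks at the middle of one norm group and at the
bottom of the other, while the judgment against the criterion is unchanged.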

One example of national import that has supported the use of
criterion-referenced exercises is the National Assessment of Educational
Progress (NAEP) (1969-1975). The test instruments presented in this
program have not been designed as standardized tests in the traditional
sense. For instance, tests are not used only to generate scores, but
are considered exercises that are reported in population group
percentages. Finley (1974) distinguishes between the National Assessment
program and traditional standardized testing programs. Briefly, these
differences have been described in the following ways:

o exercises of group versus average performance of students,

o time is extended to 6 to 8 hours rather than 30 to 70 minutes
so speed is not necessarily a factor,

o response set includes a wide variety of stimuli instead of
only the pencil and paper variety,

o exercises are administered to small groups and interviews,
not just total classes,

o exercises are prepared for high and low students, not just
the average individual,

o total scores reflect the number of students who get the
correct responses instead of the number of correct responses
by a particular student, and

o results are reported by the various exercises used instead
of in relation to a "standardization group".
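The last two differences concern the direction in which a table of
responses is summed. A minimal sketch, using an invented response matrix
and hypothetical function names, contrasts the two kinds of report; it is
not a description of the actual National Assessment scoring procedure.

```python
# Rows are students, columns are exercises; 1 = correct response.
# Hypothetical data for illustration only.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
]


def student_totals(matrix):
    """Traditional report: number of correct responses per student."""
    return [sum(row) for row in matrix]


def exercise_percent_correct(matrix):
    """Exercise-level report: percent of students correct on each exercise."""
    n_students = len(matrix)
    return [100.0 * sum(column) / n_students for column in zip(*matrix)]


totals = student_totals(responses)
percents = exercise_percent_correct(responses)

print("per student: ", totals)
print("per exercise:", percents)
```

The per-exercise report shows that the third exercise is answered
correctly by few students, information that per-student totals do not
display.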

System of Multi-Cultural Pluralistic Assessment (SOMPA). Another
challenging educational testing program is being designed by Jane Mercer
and her colleagues at the University of California (Riverside), called
the System of Multi-Cultural Pluralistic Assessment. Even though this
comprehensive assessment procedure includes "measures of adaptive behavior
and social role performance in non-academic settings", and "a careful
screening for physical disabilities", special interest is given here to
the emphasis on pluralistic norms used in this assessment system.
Mercer (1974b) reported from initial data that:

Refer to papers by Ralph Tyler, Carman Finley and George Johnson in
"Part Five: Assessing The Educational Achievement of Institutions", Tyler
and Wolf, eds. (1974), Crucial Issues of Testing, pp. 91-104.

Finley (1973), pp. 97-98.

We have tested 2,100 California public school children five
through eleven years of age: 700 Black, 700 Chicano/Latin,
and 700 Anglo-American ... Altogether, we tested children in
ninety-one different school districts and over 150 different
schools ... We factor-analyzed the forty questions asked the
mother about the family background and identified nine
characteristics of the child's socialization milieu which are
relatively independent variables. We found that five of these
factors could account for 27% of the variance in Verbal IQ,
13% of the variance in Performance IQ, and 24% of the variance
in Full Scale IQ.*

Even though the traditional standardized tests are being used (1973
revision of the WISC and a diagnostic instrument, the Bender-Gestalt),
the interpretability of the testing scores should be greatly enhanced
through the use of the pluralistic norms and the use of background data
provided by the assessment of the socio-cultural environments of the
various groups being studied.
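The percentages reported by Mercer can be restated in two equivalent
forms. The sketch below assumes, as a reading of the quotation rather than
something the source states, that "variance accounted for" is the squared
multiple correlation from a linear model; the function names are
illustrative.

```python
from math import sqrt

# Proportion of variance accounted for, as quoted from Mercer (1974b).
variance_explained = {
    "Verbal IQ": 0.27,
    "Performance IQ": 0.13,
    "Full Scale IQ": 0.24,
}


def multiple_r(r_squared):
    """Multiple correlation implied by a proportion of explained variance."""
    return sqrt(r_squared)


def unexplained(r_squared):
    """Proportion of variance not accounted for by the factors."""
    return 1.0 - r_squared


for scale, r2 in variance_explained.items():
    print(f"{scale}: R = {multiple_r(r2):.2f}, "
          f"unexplained = {unexplained(r2):.0%}")
```

Under that assumption, the five socio-cultural factors correlate roughly
0.5 with Verbal and Full Scale IQ, while most of the variance in each
scale remains unaccounted for by them.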



*Mercer (1974b), p. 14.

PART II: THE PROPOSITION

INTRODUCTION 

It is proposed that a cross-cultural comparative paradigm be
developed for use in educational testing for minority groups.

It has been shown that the development of the monocultural
testing paradigm established an inherent "separateness" of the
dominant versus the minority groups in American society. Considerable
data, theorizing and rhetoric have reinforced this conclusion,
and problems in interpreting the educational performance of these
separate cultures by the use of standardized tests have not been
resolved through the monocultural paradigm.

It may be appropriate to consider another paradigm when dealing
with diverse cultural groups. Kuhn (1962) suggests that new
paradigms are formulated and acknowledged when conflicting solutions to
pressing problems develop; that is, new strategies appear to answer
more questions than did the previous strategy. Researchers frequently
return to original postulates and hypotheses and reexamine them when
they cannot be justified by empirical data already collected, and
such an approach may be warranted in the approach to "culturally-fair"
testing. It may be appropriate, therefore, to consider the
American experience through a cross-cultural paradigm when dealing
with the subject of testing.

This section recommends the adoption of a cross-cultural
comparative paradigm for testing as a means of enabling policy-makers
to deal fairly with the "reality" of cultural separation
or homogeneity among certain groups. Traditionally, the term
cross-cultural has meant the study of distinct cultures from different
countries, nations or geographic localities. In this paper, the
term "cross-cultural" will refer to the study of different cultural
groups within the national cultural milieu of America.

A. DEVELOPING A CROSS-CULTURAL COMPARATIVE PARADIGM

The idea of a cross-cultural comparative paradigm is considered
here as having both theoretical and measurement premises, and this
section concludes with suggestions on the procedures for creating
the paradigm.

THE THEORETICAL ORIENTATION

A cross-cultural comparative paradigm must have as its theoretical
base the assumption that cultural groups can be used as variables.
Theoretical statements then can be made about these groups, and such
theoretical statements must conform to the general scientific goals
of being accurate, parsimonious, general and causal.

Support for this position may be found in the work of Przeworski
and Teune (1970), when they discuss the logic of comparative research.
They assert that theoretical statements can be made about social
groups or "systems" or "system-level variables", if those variables
are substituted for the "proper names" of those systems.

When discussing social systems they are referring to nations
and countries, but they suggest that the principles are applicable
to research designs or mathematical models dealing with social
science phenomena (e.g., cultural groups).

These authors enumerate at least two problems that are encountered
when examining the behavior of variables within systems and at the
system level. They are:

(1) "distinguish between 'spurious' and 'true' correlations
when relationships are observed at different
levels" (within or between systems),

(2) "distinguish the effects of the variables observable
only at the level of systems (diffusion patterns and
settings) from variables aggregated from within-system
observations (contexts)."*



*Przeworski and Teune (1970), p. 72.

In the first instance, there is some discussion about the gains
and losses in the generality of theoretical interpretations of
relationships between variables when certain statistical methods are
used. Several criteria are submitted to explain the conditions of
"spuriousness" when within-system regression equations are the same
or different from total regression equations. Since there is always
a compromise between theory and measurement, the assumption is made
in this discussion by the authors "that within-system relationships
are linear or, in other words, that there are no interaction effects
at the individual level." It will be seen later in the measurement
context of this paper that this assumption is usually unfounded.
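The difference between within-system and total regression equations can
be made concrete with a small numerical sketch. The data are invented, and
`ols_slope` is a hypothetical helper implementing the ordinary
least-squares slope; the example is not taken from Przeworski and Teune.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Two hypothetical systems (groups) with the same within-system relation
# but different overall levels on both variables.
group_a_x, group_a_y = [1, 2, 3, 4], [10, 9, 8, 7]
group_b_x, group_b_y = [6, 7, 8, 9], [20, 19, 18, 17]

slope_a = ols_slope(group_a_x, group_a_y)
slope_b = ols_slope(group_b_x, group_b_y)

# "Total" regression: both systems pooled, ignoring membership.
slope_total = ols_slope(group_a_x + group_b_x, group_a_y + group_b_y)

print(slope_a, slope_b, slope_total)
```

Within each hypothetical system the relationship is negative, yet the
pooled equation yields a positive slope, so a conclusion drawn at one
level would be spurious at the other.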

In the second instance, to understand this problem it must be
realized that the authors are interested primarily in those systemic
factors "that may potentially influence or be influenced by
within-system behaviors, not with properties of systems as potential
variables in system or group-level analyses."* They have summarized
the factors of interest to include: "diffusion patterns",** "settings"†
and "contexts."

The discussion of "contexts" as systemic factors has particular
relevance in this part of the paper because of its emphasis on
aggregate individual data and the measurement of that data. Przeworski
and Teune define systemic factors as contexts, noting that "when the
characteristics of individuals, whether predispositional, behavioral,
or relational, are aggregated, the social system of which they are
members acquires a parameter."‡ Two contextual variables are
distinguishable: those made up of "aggregates of relational
properties" and "aggregates of individual properties." It is the former,
aggregates of relational properties, that will be of major concern
in the following section on measurement, both from the position of
equivalence and its application to the principles of psychometric
theory.

*Ibid., p. 51.

**Diffusion patterns describe those relationships that may result
from "historical learning", sometimes referred to as Galton's problem.
(Refer to Przeworski and Teune (1970), pp. 51-53; also to the reference
cited, Naroll, R., "Galton's Problem: The Logic of Cross-Cultural
Analysis," Social Research, 32, 1965.)

†Settings are described as "neither diffusional patterns nor
aggregates of observations," but "... characteristics (historical,
institutional, external, behavior and physical) to which all
individuals within a system are, at least potentially, exposed."
(Przeworski and Teune, 1970, pp. 53-54).

‡Przeworski and Teune (1970), p. 56.

THE MEASUREMENT CONTEXT 

The measurement context of a cross-cultural comparative paradigm
is founded on the premise that the pattern of relationships between
test responses can be manipulated so as to make accurate inferential
statements about the psychological traits being sought. Heretofore,
it has been customary to manipulate only the test scores to that end.

In order to deal with patterns of relationships, one must confront
both the concept of equivalence and the concept of construct
validity.

THE CONCEPT OF EQUIVALENCE 

Equivalence is the inference drawn when it is found that there
are parallel factor distribution patterns between groups, between
sets of test items or between subsets of test items.

The rationale for this position is supported by Przeworski and
Teune when they state that "the criterion for inferring the
equivalence of measurement instruments can be found in the structure of
the indicators" and that the "basic datum" in the comparison between
systems is found in the "within-system relationships."

It has been demonstrated in Part I of this paper that attempts
to produce a culture-fair measurement instrument revealed considerable
system interferences. In other words, efforts were not made to
determine whether parallel statements could be made about factor
distribution patterns within systems before proceeding to a
cross-system analysis. As Przeworski and Teune point out, the comparison
of relationships within systems revealing the behavior of items
within that system is more indicative of system interference than
is the aggregate of system scores.
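The idea of comparing within-system relationships rather than aggregate
scores can be sketched with simple inter-item correlations. This is a
deliberately simplified stand-in for a comparison of factor patterns, not
the procedure of Przeworski and Teune; the item data, group labels and
function names are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def item_structure(items):
    """Map each pair of items to its within-group correlation."""
    return {
        (a, b): pearson(items[a], items[b])
        for a, b in combinations(sorted(items), 2)
    }


def max_structure_gap(items_1, items_2):
    """Largest difference between two groups' inter-item correlations."""
    s1, s2 = item_structure(items_1), item_structure(items_2)
    return max(abs(s1[pair] - s2[pair]) for pair in s1)


# Group B scores lower than group A on every item but shows the same
# pattern of relationships; in group C the third item behaves differently.
group_a = {"item1": [1, 2, 3, 4, 5], "item2": [2, 4, 5, 4, 6], "item3": [5, 3, 4, 2, 1]}
group_b = {"item1": [0, 1, 2, 3, 4], "item2": [1, 3, 4, 3, 5], "item3": [4, 2, 3, 1, 0]}
group_c = {"item1": [0, 1, 2, 3, 4], "item2": [1, 3, 4, 3, 5], "item3": [0, 2, 1, 3, 4]}

gap_ab = max_structure_gap(group_a, group_b)
gap_ac = max_structure_gap(group_a, group_c)

print(f"A vs B: {gap_ab:.3f}   A vs C: {gap_ac:.3f}")
```

In this sketch groups A and B differ in average score but not in item
structure, whereas group C departs from A in the structure itself, which
is the kind of difference an aggregate score would not reveal.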

Another assumption that must be highlighted in the concept of
equivalence of these authors is that direct measurements of phenomena
are accurate. However, this assumption is questioned in greater
detail in the following section on construct validity, where it is
shown that indirect measurements must be used; and even then, it
can only be said that if indirect measurements can be found to be
accurate, then it is feasible to proceed toward statements of
equivalence.

THE CONCEPT OF CONSTRUCT VALIDITY

Before the notion of equivalence can be suggested as a viable
alternative in the measurement of cultural groups, there must be
some attempt to relate this term to the principles of traditional
test theory and operations. This association between equivalence
and psychometric testing is imperative since both seek to examine
"patterns of differences" in group responses.

This section of the paper is concerned primarily with the
logic of test construction and test operations. Loevinger (1967)
provides an explanation in this regard when discussing objective
tests and their role as instruments of psychological theory. She
extends the meaning of construct validity to include some of the
crucial criteria needed to provide a psychometric foundation for
the operationalization of equivalence.

Loevinger examines the roles of the two primary concepts of
psychometric test theory, reliability and validity, concentrating
on the latter concept as the one that can impart the greatest
contributions to psychometric and psychological theory development.
She criticizes the classical definition of validity as being "... too
vague, too remote from actual measuring operations, to be useful ..."
Yet, she concedes that it is this definition, the extent to which
a test measures what it is supposed to measure, that is most used
in the psychometric tradition. In brief, she contends that
predictive, concurrent and content validities are essentially ad hoc,
and that "... construct validity is the whole of validity from a
scientific point of view."

Loevinger believes that construct validity has two meanings:

o That the test measures something systematically.
o That there should be evidence of the particular
  interpretation of what it measures.

In other words, one can describe these meanings to be defined
in the first instance as the "intrinsic validity of the test" and,
in the second instance, the "validity of interpretation." She
considers the first meaning to include "the degree of internal
structure of the items and the magnitude of external correlations"
(psychometric criteria). The latter meaning includes "the nature
of the structure, content of items, and the nature of the external
relations" (psychological criteria).

Loevinger conceives of construct validity as made up of three
components:

o The "substantive component of validity is the extent
  to which the content of the items included in (and
  excluded from?) the test can be accounted for in
  terms of the trait believed to be measured and the
  context of measurement. Context includes psychological
  theory and, in particular, the psychology of
  objective test behavior."
o The "structural component of validity refers to the
  extent to which structural relations between test
  items parallel the structural relations of other
  manifestations of the trait being measured."
o The external component of validity "... concerns
  correlation with total score. The method of
  constructing a total score from the item pattern
  necessarily implies a commitment about the
  structure of the items, and thus about the
  structure of the trait measured. That is, in a
  cumulative test, where the total score is the
  number of items scored plus, an additive model is
  implied ..."

Loevinger (1967), p. 97.

Loevinger's explication of the components of construct validity
has been closely allied with the stages of test construction, and
this section applies her concepts of structural and external validity
to the notion of equivalence.

In the discussion about the structural component of construct
validity, it was noted that traditional psychometric theory, for the
most part, has been preoccupied with the efficacy of total scores
almost to the total exclusion of analysis of individual item response
clusters. Therefore, it can be readily concluded that the assumptions
concerning the structural relations of responses are not routinely
validated between groups. The use of structurally equivalent
measurements would seem to be indicated as important at this stage
of test construction and development.

In the discussion concerning the external component of construct
validity, the "non-test criterion," against which the test must
inevitably be judged, seems to encourage the formulation of
hypotheses about relationships between groups. Predictions about these
relationships may have to be adjusted in view of the structural
relations found within and between group responses. Even though
Loevinger describes the use of factor analysis in this area, serious
judgmental decisions may have to be made about the empirical criteria
to which these tests predict. The concept of equivalence reveals the
need to examine the ways in which empirical criteria are established
to maximize test predictive validity, and vice versa.

PROCEDURE 

The procedure for creating a paradigm for use in cross-cultural 
comparative analyses is demonstrated in the following six steps: 
FIRST: Select populations to be sampled.

As Przeworski and Teune point out, the populations to be sampled
in comparative studies should be taken from "natural" groupings,
based on societies, economics, politics or culture. It has been
demonstrated that "natural" groupings in the American society would
be the minority groups which are the concern of this paper. However,
these authors point out that one must be assured that the
characteristics of persons selected are sufficiently random to make an
adequate sample.*

It should be noted that traditional testing techniques have
been very successful in identifying which groups should be sampled.
Przeworski and Teune suggest that several studies of group
characteristics are available.** These studies have delineated some
characteristics that may be appropriate for further examination in the
context of specific cultural groups.

SECOND: Select behavioral constructs to be sampled within each
cultural group.

Behavioral constructs are those stimuli or variables which are
used in the construction of test items and are thus operationally
defined by the test designer.

Sears (1961) was concerned about the problem of conceptual
equivalence and the need to find transcultural variables. Even
though he was writing in a broad cross-cultural sense (cultures from
different countries), he makes relevant statements that could apply
in this situation. Sears points out the necessity for what he
calls "transcultural" variables and identifies the criterion
essential for their selection, as follows: "They must be measurable
in whatever culture is chosen, whether the culture be a unit
of the sample population or a source of systematic variation of
an interaction variable."*** Recognizing that the criteria to be
used in developing "conceptual equivalence" are not clearly
understood, he suggests that in defining these criteria the problems
will probably be no different at the cross-cultural level than at
the "inter-individual" level.

*Przeworski and Teune, op. cit., p. 57.
**Lazarsfeld and Rosenberg, The Language of Social Research,
Free Press, Glencoe, Ill., 1955 (Section IV). Also cited in the same
reference is R. Cattell's work "Types of Group Characteristics."
***Sears (1961), p. 446.

Suitable criteria that may be used in the sampling of behavioral
constructs can be derived by the factor analytic method. This method
has been characterized as a tool for the study of construct validity
and, more specifically, for the testing of hypotheses about
relationships among known variables.

Kerlinger (1964) discusses this method as it has been used in
psychological and educational research efforts. He makes the case
that little is known about the construct of achievement, and that
because in many respects standardized achievement tests are
"factorially complex", users should be particularly alert to question
their construct validity.

Loevinger (1967) discusses a problem that occurs at this stage
of test construction:

    "... the more one objectifies the nature of the universe
    from which the sample of items is to be drawn, the less
    likely is the universe to represent exactly the trait which
    the investigator wishes to measure. Moreover, for any
    given trait name, two investigators would not necessarily
    specify the same objective domain for which to draw a
    sample, nor the same method of sampling."



THIRD: Establish criteria for the method of sampling behavioral 
constructs. 

Literature is available which is devoted to the study of the
methods of sampling test responses. Such literature is devoted
primarily to the various approaches used to get the most reliable
and valid responses from children.



Sears, op. cit., p. 453.
Kerlinger (1964), p. 681.
Loevinger (1967), p. 93.
The psychometric tradition, which is preserved in many current
standardized tests of achievement, emphasizes one correct answer.
Kamii (1971) reveals that "these practices reflect an additive view
of knowledge and a philosophy of education that values the child's
ability to give correct answers." The "additive view" restricts the
probable uses of a response to a single correct response, not taking
into consideration that other cultural groups may interpret other
uses as being more appropriate for the same response.

An alternative approach, which consumes more time in testing,
is called the exploratory method. This method places emphasis on
the process by which the answer is given instead of the end product
of the response. The significance of this approach for various
cultural groups is that it allows one to have more information with
which to evaluate cultural group responses because it answers the
question of "why" certain responses appeared. Thus, it becomes
easier for the test designer to approach the eventual goal of
equivalent statements.

FOURTH: Hypothesize positive and negative relationships between
several related behavioral constructs within each culture (variables
are operationally defined).

Because of the problems of conceptual equivalence and operational
definitions discussed by Sears, there is a definite need for a great
deal of testing within cultures before engaging in comparative study
between cultures.

Several theoretical variables which have been used to demonstrate
performances between groups, as yet, have not been defined
sufficiently within groups.

Sears, referring to this difficulty of the interchangeability of
behavioral indices, uses this graphic diagram to make his point:

[Diagram not recoverable from the source.]

X-X and Y-Y relationships must be examined carefully before X-Y
relationships are sought.

Kamii (1971), p. 340.
Sears (1961), op. cit., p. 447.



At this point, our processes overlap with test theory and test
methodological procedures, where the manipulation of test items is
based on the investigator's assumptions about reliability and validity.

FIFTH: Identify wide ranges of structural relationships between
test items.



Przeworski and Teune* demonstrate the mathematical assumptions
which can be applied to equivalence. Their assumptions are applied
below to the problem of measuring educational achievement among two
cultural groups, in this case, Black and White third grade students.

1. There are a number of items X_1, X_2, ..., X_n that can
   be used to measure achievement in each cultural
   group. The assumption that has not been empirically
   verified is that these achievement items have the
   same factorial structure between groups and, therefore,
   that any subset of these items is also similar.
   Therefore, the following assumption can be made.

2. A set of items X_c is common to all groups.

3. For each cultural group C_j, there is a set of items
   X_j that is specific to the given cultural context
   of behavioral responses.

Given statements (2) and (3), one may conclude that:

4. For each cultural group C_j, there is a set of test
   items X that is composed of subsets X_j and X_c.

5. If items X_1, X_2, ..., X_n are highly correlated
   with each other, the set of X is considered
   homogeneous. (This is the likely objective of most
   test items found in traditional objective tests.)

6. However, little empirical data can be found on the
   homogeneity between subsets of items between groups.
   In other words, for each cultural group, little is
   known about the intercorrelations of subsets X_c and
   X_j, in terms of their structural similarity.

*Przeworski and Teune, "Equivalence in Cross-National Research,"
The Public Opinion Quarterly, Vol. XXX, 1966 (pp. 551-568).

In descriptive comparisons between groups, there has been a
greater emphasis on examining the similarity of means and standard
deviations. From these scores, inferences have been made about
equivalency of measurement statements. As Przeworski and Teune
would agree, these judgments have been supported more on the
empirical assumptions about the behavior of groups, rather than the
actual behavior of the test items within and between these groups.
By examining the actual behavior of the test items, one is then
free to approach statements about parallel factor distribution
patterns.
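
Why similar means and standard deviations are insufficient can be
shown with a small constructed case. The sketch below is illustrative
only: the two groups, the two items and their scores are hypothetical,
and a two-item inter-correlation is assumed as the simplest possible
indicator of "the actual behavior of the test items." Both groups
obtain exactly the same total scores, and therefore the same mean and
standard deviation, yet the items relate to each other quite
differently within each group.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def summarize(item1, item2):
    """Return total-score statistics and the inter-item correlation."""
    totals = [a + b for a, b in zip(item1, item2)]
    return {
        "mean": mean(totals),
        "sd": pstdev(totals),
        "item_r": pearson(item1, item2),
    }


# Hypothetical scores of four respondents per group on two items.
group_a = summarize([0, 0, 1, 1], [0, 0, 1, 1])  # items move together
group_b = summarize([0, 0, 2, 0], [0, 0, 0, 2])  # items trade off

print(group_a)
print(group_b)
```

A comparison restricted to the first two entries of each summary would
declare the groups alike; only the third entry shows that the items do
not share a common structure across the groups.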

SIXTH: Establish equivalent measurement statements between two or
more cultural groups.

When the following conditions are met, it is possible to make
equivalent measurement statements.

(1) If the analyses show total invariance in the
    structural patterns of test scores or relationships
    across cultures, one is able to infer
    general statements about behavior;

(2) If the analyses reveal partial variance, one
    is able to infer general statements only from
    those relationships that are invariant;

(3) If the analyses reveal total variance, one can
    make general statements only about each culture,
    but not across cultural groups.

If either condition (2) or (3), above, is met, it cannot be
said that equivalence has been established across groups.
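
One way the three conditions might be operationalized is sketched
below. The paper does not prescribe a statistic; Tucker's congruence
coefficient is assumed here as a conventional index of the similarity
of factor loadings between two groups, and the cut-off of 0.95, the
factor names and the loadings are all hypothetical choices made for
illustration. Each factor is compared across the two groups, and the
outcome is labeled according to conditions (1), (2) and (3).

```python
from math import sqrt


def congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    numerator = sum(x * y for x, y in zip(a, b))
    return numerator / sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def classify(loadings_a, loadings_b, threshold=0.95):
    """Label the cross-group comparison and list the invariant factors.

    loadings_a and loadings_b map a factor name to that factor's item
    loadings in each group. The threshold is an assumed rule of thumb.
    """
    invariant = [
        name
        for name in loadings_a
        if congruence(loadings_a[name], loadings_b[name]) >= threshold
    ]
    if len(invariant) == len(loadings_a):
        verdict = "total invariance"   # condition (1)
    elif invariant:
        verdict = "partial variance"   # condition (2)
    else:
        verdict = "total variance"     # condition (3)
    return verdict, invariant


# Hypothetical loadings of four items on two factors in each group.
group_a = {"verbal": [0.8, 0.7, 0.1, 0.0], "numeric": [0.1, 0.0, 0.9, 0.8]}
group_b = {"verbal": [0.7, 0.8, 0.0, 0.1], "numeric": [0.6, 0.5, 0.3, 0.2]}

print(classify(group_a, group_b))
```

Under these assumptions only the "total invariance" label would
support a statement of equivalence across groups; the "partial
variance" label limits general statements to the factors returned in
the second element.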

*Refer to the work of: Przeworski and Teune (1970) and
Triandis (1972).
As can be seen from the above sections, the paradigm may be
applied to the tasks of either making inferential measurement
statements from existing tests or serving as a guide in the
construction of new tests.

Traditionally, cross-cultural studies have been concerned in
part with the degree to which factorial structure is stable among
various cultural groups. The question becomes: Why have traditional
psychometric practices ignored the lack of empirical psychometric data
on the generality of factors between different social groups? The
assumption about the similarity of factor structure between and
within item clusters is subject to empirical verification; and,
until this line of research is exhausted, inferential statements
about the performance of various groups may be inaccurate.

Several investigations have documented a list of factors that
can produce differences in the structure of abilities:* linguistic
systems, genetics, environmental demands and mode of life of
subjects. It should be noted that these categories come close to
describing the categories found in the literature on culture-fair
testing.

It is argued in this paper that the reason for the slow
progress in the application of psychometric principles to the
structural relationships of group responses has been the almost total
reliance of the testing process on a monocultural interpretation.
Given a cross-cultural paradigm of testing, the equivalence of
cultural groups becomes the fundamental postulate. That postulate does
not imply that no differences should exist between groups; but it does
imply that, when measurement statements are compared across groups,
and when these statements depict patterns of differences between
groups, they should have equivalent meaning.






B. POLICY IMPACT OF A PARADIGM SHIFT

Because policymakers in education are under the control of the
executive branches of federal and state governments, they are
required to view their administrative decisions from a political
base. Therefore, the issue of selecting a paradigm for culture-fair
testing, as an additional alternative by which to view testing,
can be viewed also from a political perspective.

*Refer to Guthrie (1967), p. 458.

The political problem facing the decision-maker is how to
meet the demands of both minorities and the general public while
responding to legislative mandates. Maintenance of this delicate
balance can be achieved, in part, through the application of a
cross-cultural comparative paradigm for testing.

A cross-cultural paradigm would have the effect of providing
some solutions to the dilemmas facing policymakers, as defined in
Part I, because the recommended paradigm:

o Suggests a cross-cultural strategy and the establishment
  of equivalent instruments, permitting the interpretation
  of valid data within cultures and reliable
  data across cultural groups,
o Reinforces cultural values among groups by identifying
  behavior sampling techniques within subject cultural
  groups as well as among cultural groups,
o Provides a testing alternative in which federal and
  state decision makers can expand the educational
  premises on which they formulate policy and
  legislation.

The evaluation of the effectiveness of federal compensatory
educational programs must rely, in part, on the results of standardized
testing. There is need for some federal policy in the selection and
use of tests, especially in programs funded under ESEA Title I,
and programs under the EOA Act of 1965.

Currently, many agencies of the Federal government are in the
process of developing uniform guidelines for more effective
regulation of employee testing.

While the actions of these agencies do not have a direct bearing
on the problems of testing in the context of education, they
represent the most recent federal initiative in the investigation of
test abuse. The task ahead, therefore, is to explore strategies
that could be applied in the evaluation of the selection and use
of tests in early childhood and elementary education.

One such effort could involve the cross-cultural alternative
presented in this paper. With needed empirical verification, this
proposition could be used in the creation of a preliminary process,
method or technique that would aid in the effective screening of
tests proposed in minority-related research, development and
evaluation projects.